Generalization Error of Combined Classifiers
Authors
Abstract
We derive an upper bound on the generalization error of classifiers which can be represented as thresholded convex combinations of thresholded convex combinations of functions. Such classifiers include single hidden-layer threshold networks and voted combinations of decision trees (such as those produced by boosting algorithms). The derived bound depends on the proportion of training examples with margin less than some threshold and the average complexity of the combined functions (where the average is over the weights assigned to each function in the convex combination). The complexity of the individual functions in the combination depends on their closeness to threshold. By representing a decision tree as a thresholded convex combination of weighted leaf functions, we apply this result to bound the generalization error of combinations of decision trees. Previous bounds depend on the margin of the combined classifier and the average complexity of the decision trees in the combination, where the complexity of each decision tree depends on the total number of leaves. Our bound also depends on the margin of the combined classifier and the average complexity of the decision trees, but our measure of complexity for an individual decision tree is based on the distribution of training examples over leaves and can be significantly smaller than the total number of leaves.
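To make the central quantity in such bounds concrete, the following is a minimal Python sketch (illustrative only, not the paper's construction): it simulates a voted convex combination of base classifiers, computes each training example's margin, and measures the proportion of examples with margin below a threshold θ, the first term the bound depends on. All data, weights, and base classifiers here are invented for illustration.

```python
import numpy as np

# Invented toy setup: n examples with labels in {-1, +1} and T base
# classifiers h_t(x) in {-1, +1}, simulated as noisy copies of the labels.
rng = np.random.default_rng(0)
n, T = 200, 5
y = rng.choice([-1, 1], size=n)
H = np.where(rng.random((T, n)) < 0.8, y, -y)  # H[t, i] = h_t(x_i)

# Convex combination weights: alpha_t >= 0 and sum_t alpha_t = 1.
alpha = rng.random(T)
alpha /= alpha.sum()

# f(x_i) = sum_t alpha_t * h_t(x_i); the margin y_i * f(x_i) lies in [-1, 1]
# and is positive exactly when the weighted vote classifies x_i correctly.
f = alpha @ H
margins = y * f

# The bound depends on the fraction of training examples whose margin
# falls below a chosen threshold theta.
theta = 0.2
print(f"fraction with margin < {theta}: {np.mean(margins < theta):.3f}")
```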
Similar Resources
Empirical Margin Distributions and Bounding the Generalization Error of Combined Classifiers
We prove new probabilistic upper bounds on generalization error of complex classifiers that are combinations of simple classifiers. Such combinations could be implemented by neural networks or by voting methods of combining the classifiers, such as boosting and bagging. The bounds are in terms of the empirical distribution of the margin of the combined classifier. They are based on the methods ...
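For reference, the margin quantities these bounds are stated in terms of can be written as follows. These are the standard definitions for a convex combination of base classifiers, not a restatement of either paper's theorems:

```latex
% Margin of a convex combination f of base classifiers h_t : X -> [-1, 1]:
\[
  f(x) = \sum_{t=1}^{T} \alpha_t h_t(x),
  \qquad \alpha_t \ge 0, \quad \sum_{t=1}^{T} \alpha_t = 1,
  \qquad \operatorname{margin}(x, y) = y\, f(x) \in [-1, 1].
\]
% Empirical margin distribution: the fraction of the n training examples
% whose margin is at most a threshold theta.
\[
  \widehat{F}_n(\theta) = \frac{1}{n} \sum_{i=1}^{n}
    \mathbf{1}\{\, y_i f(x_i) \le \theta \,\}.
\]
```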
Bounding the Generalization Error of Neural Networks and Combined Classifiers
Recently, several authors developed a new approach to bounding the generalization error of complex classifiers (of large or even infinite VC-dimension) obtained by combining simpler classifiers. The new bounds are in terms of the distributions of the margin of combined classifiers, and they provide some theoretical explanation of the generalization performance of large neural networks. We obtained new probabilistic ...
Improved Boosting Algorithm Using Combined Weak Classifiers
From the family of corrective boosting algorithms (e.g., AdaBoost, LogitBoost) to totally corrective algorithms (e.g., LPBoost, TotalBoost, SoftBoost, ERLPBoost), we analyze these methods' sample-weight updates. Corrective boosting algorithms update the sample weights according to the last hypothesis only; by contrast, totally corrective algorithms update the weights with the best of all weak classifi...
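To illustrate the distinction, here is a minimal sketch of the corrective update used by AdaBoost, in which the sample weights are re-weighted using only the most recent hypothesis. This is the standard AdaBoost update; the toy data and stand-in weak learner are invented:

```python
import numpy as np

# Toy illustration of a "corrective" update: weights depend only on h_t.
rng = np.random.default_rng(1)
n = 100
y = rng.choice([-1, 1], size=n)       # labels in {-1, +1}
w = np.full(n, 1.0 / n)               # initial uniform sample weights

for t in range(3):
    h = np.where(rng.random(n) < 0.7, y, -y)   # stand-in weak hypothesis
    eps = np.sum(w[h != y])                    # weighted error of h_t
    alpha = 0.5 * np.log((1 - eps) / eps)      # hypothesis weight
    w *= np.exp(-alpha * y * h)                # re-weight via last h_t only
    w /= w.sum()                               # renormalize to a distribution
    print(f"round {t}: weighted error = {eps:.3f}, alpha = {alpha:.3f}")
```

A totally corrective method would instead re-optimize the weight distribution against all hypotheses chosen so far, typically by solving a small optimization problem each round.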
Direct Optimization of Margins Improves Generalization in Combined Classifiers
[Figure: Cumulative training margin distributions for AdaBoost versus the "Direct Optimization Of Margins" (DOOM) algorithm. The dark curve is AdaBoost, the light curve is DOOM. DOOM sacrifices significant training error for improved test error (horizontal marks on the margin = 0 line). Horizontal axis: margin, from -1 to 1.]
Generalization error bounds for classifiers trained with interdependent data
In this paper we propose a general framework for studying the generalization properties of binary classifiers trained with data that may be dependent but are deterministically generated from a sample of independent examples. The framework provides generalization bounds for binary classification and some cases of ranking problems, and clarifies the relationship between these learning tasks.
Journal: J. Comput. Syst. Sci.
Volume: 65, Issue: -
Pages: -
Publication year: 2002